multi-agent search
Building AI that can master complex cooperative games with hidden information
We've built an AI bot that achieves state-of-the-art results in Hanabi, a collaborative card game that has been cited as a benchmark game for AI research because it features both cooperative gameplay and imperfect information. Our bot outperforms previous AI algorithms at Hanabi by using real-time search to fine-tune its decisions during gameplay. It's the first bot to exceed elite human performance in the game, as judged by experienced players who have evaluated it. Researchers have found it challenging to apply search beyond perfect-information games like chess and Go. Our success with Hanabi suggests search can improve more AI systems and eventually help build AI that learns to master complex cooperative tasks in real-world settings.
Facebook taught an AI the 'theory of mind'
When it comes to competitive games, AI systems have already shown they can easily mop the floor with the best humanity has to offer. But life in the real world isn't a zero sum game like poker or Starcraft and we need AI to work with us, not against us. That's why a research team from Facebook taught an AI how to play the cooperative card game Hanabi (the Japanese word for fireworks), to gain a better understanding of how humans think. Specifically, the Facebook team set out to instill upon its AI system the theory of mind. "Theory of mind is this idea of understanding the beliefs and intentions of other agents or other players or humans," Noam Brown, a researcher at Facebook AI, told Engadget.
Improving Policies via Search in Cooperative Partially Observable Games
Lerer, Adam, Hu, Hengyuan, Foerster, Jakob, Brown, Noam
Recent superhuman results in games have largely been achieved in a variety of zero-sum settings, such as Go and Poker, in which agents need to compete against others. However, just like humans, real-world AI systems have to coordinate and communicate with other agents in cooperative partially observable environments as well. These settings commonly require participants to both interpret the actions of others and to act in a way that is informative when being interpreted. Those abilities are typically summarized as theory of mind and are seen as crucial for social interactions. In this paper we propose two different search techniques that can be applied to improve an arbitrary agreed-upon policy in a cooperative partially observable game. The first one, single-agent search, effectively converts the problem into a single agent setting by making all but one of the agents play according to the agreed-upon policy. In contrast, in multi-agent search all agents carry out the same common-knowledge search procedure whenever doing so is computationally feasible, and fall back to playing according to the agreed-upon policy otherwise. We prove that these search procedures are theoretically guaranteed to at least maintain the original performance of the agreed-upon policy (up to a bounded approximation error). In the benchmark challenge problem of Hanabi, our search technique greatly improves the performance of every agent we tested and when applied to a policy trained using RL achieves a new state-of-the-art score of 24.61 / 25 in the game, compared to a previous-best of 24.08 / 25. Introduction Real-world situations such as driving require humans to coordinate with others in a partially-observable environment with limited communication. In such environments, humans have a mental model of how other agents will behave in different situations (theory of mind). This model allows them to change their beliefs about the world based on why they think an agent acted as they did, as well as predict how their own actions will affect others' future behavior. Together, these capabilities allow humans to search for a good action to take while accounting for the behavior of others.
Cooperative, Dynamics-based, and Abstraction-Guided Multi-robot Motion Planning
This paper presents an effective, cooperative, and probabilistically-complete multi-robot motion planner that enables each robot to move to a desired location while avoiding collisions with obstacles and other robots. The approach takes into account not only the geometric constraints arising from collision avoidance, but also the differential constraints imposed by the motion dynamics of each robot. This makes it possible to generate collision-free and dynamically-feasible trajectories that can be executed in the physical world.The salient aspect of the approach is the coupling of sampling-based motion planning to handle the complexity arising from the obstacles and robot dynamics with multi-agent search to find solutions over a suitable discrete abstraction. The discrete abstraction is obtained by constructing roadmaps to solve a relaxed problem that accounts for the obstacles but not the dynamics. Sampling-based motion planning expands a motion tree in the composite state space of all the robots by adding collision-free and dynamically-feasible trajectories as branches. Efficiency is obtained by using multi-agent search to find non-conflicting routes over the discrete abstraction which serve as heuristics to guide the motion-tree expansion. When little or no progress is made, the routes are penalized and the multi-agent search is invoked again to find alternative routes. This synergistic coupling makes it possible to effectively plan collision-free and dynamically-feasible motions that enable each robot to reach its goal. Experiments using vehicle models with nonlinear dynamics operating in complex environments, where cooperation among robots is required, show significant speedups over related work.